(57) ABSTRACT

Given a digital video clip, this invention describes how to 
efficiently compute the motion vectors and motion trajectory
of each identified video object for facilitating various com- 
monly encountered visual applications, such as video com- 
pression for transmission and archiving, security and sur- 
veillance monitoring, and search-by-query required in the 
Internet search engine or digital library. 
Fig. 2(a) Large Diamond Search Pattern (LDSP)
Fig. 2(b) Small Diamond Search Pattern (SDSP)

Fig. 3(a) Large Diamond Search Pattern (LDSP)
Fig. 3(b) Small Diamond Search Pattern (SDSP)

Fig. 4(a) Horizontal Hexagon Search Pattern (HHSP)
Fig. 4(b) Vertical Hexagon Search Pattern (VHSP)
Fig. 4(c) Interlaced Hexagon Search Pattern (IHSP)
Fig. 8. An illustration of a systolic array organization



05/17/2004. EAST Version: 1.4.1 



Patent Application Publication Aug. 22, 2002 Sheet 7 of 11 US 2002/0114394 A1



Fig. 9. A typical processing element (PE) structure (signal labels: m in, m out, r out)



Fig. 10. An illustration of barrel shifter architecture (labeled blocks: systolic array, MUX, barrel shifter 1, barrel shifter 2, 24-module memory)






Fig. 11. Time scheduling of data: (a) Current-block data and (b) Reference-block data
(b) reference images.
decision-making process for identifying each MV's characteristic and the corresponding
filtering action taken.
SYSTEM AND METHOD FOR MOTION VECTOR
GENERATION AND ANALYSIS OF DIGITAL 
VIDEO CLIPS 

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

[0001] This application claims priority to U.S. Provisional
Patent Application No. 60/251,709, filed Dec. 6, 2000.

TECHNICAL FIELD 

[0002] This invention relates to a system and method for
fast generation of motion vectors from digital video frames,
and for extraction of the motion trajectory of each
identified video object based on the generated motion
vectors. Motion vector generation is often required in
digital video compression for real-time visual
communications and digital video archiving. Video-object
motion trajectory generation is particularly useful in
applications such as surveillance monitoring, or searching
distributed digital video databases to retrieve the video
clips relevant to a query.

[0003] In addition, the invented motion-trajectory match- 
ing scheme presented in the last part of the system is a 
generic solution for measuring the distance or degree of 
similarity between a pair of digital information sequences 
under comparison, and can be directly exploited for other 
kinds of data, such as handwriting curves, musical notes, or 
audio patterns. 

BACKGROUND 

[0004] In the era of multimedia communications, processing
digital video poses many technical challenges because of the
large amount of data involved and the limited channel
bandwidth available in practice. For example, in
teleconferencing or videophone applications, transmitting
digital video (say, acquired through a digital camera) to
the receiver in real time requires compression. Compression
greatly reduces the original amount of video data by
discarding redundant information while keeping the essential
information as intact as possible, in order to maintain the
original video quality at the receiver side after
reconstruction. Such video processing is called digital
video coding.

[0005] A basic method for compressing digital color video
data to fit the available bandwidth has been adopted by the
Moving Picture Experts Group (MPEG), which produced the
MPEG-1, MPEG-2, and MPEG-4 compression standards. MPEG
achieves high data compression by applying the Discrete
Cosine Transform (DCT) to the intra-coded pictures (called
I-frames) and motion estimation and compensation to the
inter-coded pictures (called P-frames or B-frames).
I-frames occur only every so often and are the least
compressed frames; they therefore yield the highest video
quality and are used as reference anchor frames. The frames
between the I-frames are P-frames and/or B-frames, generated
from nearby I-frames and/or existing P-frames. Fast motion
estimation for generating motion vectors is conducted for
the P-frames and B-frames only. A typical frame structure
could be IBBPBBPBBPBBPBB IBBPB . . . , repeated until the
last video frame. The so-called Group of Pictures (GOP)
begins with an I-frame and ends on the frame that
immediately precedes the next I-frame. In the above example,
the size of the GOP is 15.

[0006] To generate motion vectors by motion estimation,
each P-frame is partitioned into smaller blocks of pixel
data, typically 16x16 in size, called macroblocks (MBs) in
MPEG's jargon. Each MB is then shifted around its
neighborhood on the previous I-frame or P-frame in order to
find the most similar block within the imposed search
range. Only the motion vector of the most similar block is
recorded and used to represent the corresponding MB. Motion
estimation for a B-frame is conducted similarly but in both
directions: forward prediction and backward prediction.
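
As an illustration only, the exhaustive block matching described in this paragraph can be sketched in Python as follows. The function names `sad` and `full_search`, the list-of-rows frame layout, and the default search range of 7 pixels are assumptions made for this sketch; the sum of absolute differences is used as the matching error, which is one common choice rather than a requirement.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luminance samples, indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by)
    in the current frame and the block displaced by (dx, dy) in the
    reference frame."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                n: int = 16, sr: int = 7) -> Tuple[Tuple[int, int], int]:
    """Try every displacement within +/- sr pixels and return the motion
    vector (dx, dy) with the smallest SAD, together with that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            # Skip candidates whose block would fall outside the reference.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Every candidate position costs one full block comparison, which is why the number of positions visited dominates the encoder's running time.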

[0007] Note that fast motion estimation methods can be
directly applied to all existing international video-coding
standards, as well as to any proprietary compression system
that adopts a similar motion-compensated video coding
methodology, because they all share the approach described
above for reducing temporal redundancy. Besides MPEG,
another set of video coding standards, ITU's H.261, H.263,
and H.26L, for teleconferencing or videophone applications,
also requires such motion vector generation.

[0008] The above-mentioned exhaustive search typically
consumes a large portion (about three-quarters) of the total
processing time at a typical video encoder. Hence, a fast
algorithm is indispensable to the realization of real-time
visual communications services. For that purpose, we
invented a scalable technique for performing fast motion
estimation. The scalability is useful for meeting different
requirements, such as implementation, delay, distortion,
computational load and robustness, while minimizing the
incidence of over-search (which increases delay) or
under-search (which might increase distortion). For
example, in a multimedia PC environment with such a scalable
implementation, the user can choose among a few video
quality modes for different visual communications
applications, and even for different Internet traffic
situations and/or types of service. In videophone, for
example, a small delay in conversation is probably the most
important requirement, traded off against reduced video
quality. In another application scenario, a different fast
motion estimation algorithm can be selected for creating a
high-quality video email (if so desired) to be sent later.
This is an off-line video application, for which delay is
not an issue. Another example is so-called object-oriented
video coding, where multiple video objects are identified:
activating one of the block-matching motion estimation
profiles can flexibly generate the motion vectors
associated with each video object.

[0009] After the MVs are generated, certain simple
statistical measurements (say, mean and variance) of the MVs
can easily be computed to yield a "content-complexity
indicator" (in the MPEG-4 video coding standard, a mnemonic
called f_code). Such an indicator is useful for summarizing
each video frame. For example, based on the category
information of the f_code, one can easily locate the shots
that contain high-motion activity.

[0010] The segmented regions that correspond to each video
object can form an alpha-plane mask, which is basically a
binary mask for each video object and for each individual
frame, separating the object from the background. Based on
such alpha-plane information, the user can easily engage
interactive hypermedia-like functionality with the video
frames. For example, the user can click on any video object
of interest at any time, say, a fast-moving racing car;
an information box then pops up and provides some
pre-stored information, such as the driver's name and age,
past driving record and Grand Prizes awarded, and other
relevant information. Note that each video object has its
own associated information box, and its trajectory can
serve as a reliable linkage among the alpha-plane masks of
the same video object.

[0011] The motion vectors generated as described above can
be further processed for intelligent content-based indexing
and retrieval. For example, searching a large database for
relevant multimedia materials (say, video clips) and
retrieving those whose content is identical or very similar
to that of the query would be very desirable in many
applications, such as Internet search engines and digital
libraries. Rather than relying on the conventional approach,
that is, keywords only, the so-called content-based search
is promising and effective in achieving this objective,
since "content" such as color, texture, shape, a video
object's motion trajectory, and so on, is often hard, and
sometimes impossible, to describe in words. Content-based
search of multimedia materials is therefore a powerful tool
for this purpose. It is not a trivial task, however, and in
fact needs a suite of intelligent processes. Besides other
prominent features such as color, texture, and shape, motion
trajectory is another key feature of digital video. In this
invention, "content" specifically means the motion
trajectory of a video object identified from the given
digital video clip. The remainder of this invention presents
a method capable of automatically identifying multiple
moving video objects and then simultaneously tracking them
based on their respective motion trajectories. In this
scenario, the user can pose a query by drawing a curve on a
computer, say, a parabola to signify a diver's diving
action, in order to search for video clips that contain
such content.

[0012] Our invention essentially provides a fundamental
core technology that mimics, to a certain degree, the human
capability of detecting moving video objects and tracking
each object's individual movement. A typical application
that can benefit from this invention is as follows. In a
security surveillance environment, intruding moving objects
can be automatically detected, and the trajectory
information can be used to steer the video cameras to follow
the movement of the video objects while recording the
incidents. Another application example can be found in
digital video indexing and retrieval.

SUMMARY OF THE INVENTION 

[0013] It is an object of the present invention to provide a 
scalable system that integrates several methods for perform- 
ing fast block-matching motion estimation to generate 
motion vectors for video compression and/or as the required 
input data to conduct other video analysis. 

[0014] It is a further object to provide a hardware
implementation architecture for realizing the core (i.e.,
diamond search) of these fast block-matching motion
estimation algorithms.



[0015] It is a further object to provide a method that,
based solely on motion vector information, searches a video
database and identifies those video clips containing video
objects whose trajectories best match that of a query video
clip or a trajectory curve drawn by the user.

[0016] It is a further object to provide a generic solution 
for measuring the degree of similarity of two chain-codes 
under comparison, where each chain-code is obtained by
converting the original discretized information encountered 
in various applications, such as handwriting curves, musical 
notes, extracted audio tones, and so on. 

[0017] In summary, these and other objects of the present 
invention are achieved by a system comprising means for
generating motion vectors using any one of available fast 
block-matching motion estimation techniques organized and 
integrated in a scalable fashion to optimally meet the 
demand of various tradeoffs (such as, speed-up gain, com- 
plexity, video quality, etc.), means for realizing the core of 
these motion estimation techniques for hardware implemen- 
tation, means for smoothing noisy raw data, means for 
clustering the motion-vector data and validating the clusters 
so as to automatically detect the video objects, means for 
estimating motion trajectory of each detected video object, 
means for comparing each derived motion trajectory curve 
with respect to a database of motion trajectories, means for 
receiving a query trajectory and means for identifying video 
clips having video objects best matching the query motion 
trajectory. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0018] FIG. 1 is an overview of the system and method
according to the present invention.

[0019] FIGS. 2(a) and 2(b) are diamond search patterns 
for frame-predictive motion estimation in non-interlaced (or 
progressive-scanned) video. 

[0020] FIGS. 3(a) and 3(b) are diamond search patterns
for field-predictive motion estimation in interlaced video. 

[0021] FIGS. 4(a) and 4(b) are hexagon search patterns for
frame-predictive motion estimation in non-interlaced (or
progressive-scanned) video. For interlaced video, the
interlaced hexagon search pattern shown in FIG. 4(c) is
used for field-predictive motion estimation.

[0022] FIG. 5 shows various types of regions of support
(ROS).

[0023] FIGS. 6(a) and 6(b) illustrate the motion adaptive
pattern (MAP) and the adaptive rood pattern (ARP),
respectively, which are used in the initial search to
identify the best starting position for the local refined
search of each block.

[0024] FIG. 7 is a system architecture for a 2-D systolic 
array. 

[0025] FIG. 8 is an illustration of a systolic array orga- 
nization. 

[0026] FIG. 9 is a typical processing element structure. 

[0027] FIG. 10 is an illustration of a barrel shifter archi- 
tecture. 
[0028] FIGS. 11(a) and 11(b) show the time scheduling of
current-block data and reference-block data, respectively. 

[0029] FIGS. 12(a) and 12(b) show the actual subscript 
positions with respect to the positions in the current and 
reference images, respectively. 

[0030] FIG. 13(a) is an architecture of the switching-based
median filter; FIG. 13(b) is the hierarchical
decision-making process for identifying each MV's
characteristic and the corresponding filtering action
taken.

[0031] FIG. 14 is the architecture of the maximum 
entropy fuzzy clustering (MEFC) to achieve unsupervised 
identification of clusters without any a priori assumption. 

[0032] FIG. 15 shows the three main stages of the
bi-directional motion tracking scheme: bi-directional
projection, recursive VO tracking and validating, and
Kalman filter smoothing.

DETAILED DESCRIPTION OF THE 
INVENTION 

[0033] The invention is best organized and described as
comprising four parts, Parts A, B, C, and D; the entire
system consisting of these four parts is illustrated in
FIG. 1. Part A is the scalable fast block-matching motion
estimation method for generating motion vectors
efficiently, taking into account tradeoffs among multiple
factors, such as computational gain, complexity, video
quality, and system and application requirements. Part B
presents a systolic-array implementation architecture for
realizing, in hardware, the computationally intensive core
of the diamond search system described in Part A. Part C is
the method for generating the motion trajectory of each
detected video object, which consists of a series of data
operations: smoothing of the motion vector field, formation
of data clusters through clustering over the smoothed
field, formation of video objects through a validation
process, and motion trajectory generation for each detected
video object. Part D is the method for matching and
recognition of chain-coded information, including
hand-drawn curves, characters, symbols, or even musical
notes and extracted audio tones.

[0034] Part A. Scalable Fast Block-Matching Motion Esti- 
mation 

[0035] This part of the invention presents a scalable fast
block-matching motion estimation system for the generation
of motion vectors (MVs), which is indispensable in certain
applications, such as video compression systems for visual
communications. The scalability introduced in this fast
motion estimation system is realized in two aspects: search
pattern scalability and search distance computing
scalability. For the former, multiple block-matching motion
estimation (BMME) algorithms are introduced, while for the
latter, a simple downsampling process on the pixel field is
effective. Individual or combined use of these two
scalability factors dynamically controls the generation of
motion vectors flexibly, efficiently and optimally, while
meeting important requirements and constraints, such as
computational gain, complexity, quality-of-service (QoS),
networking dynamics and behaviors, as well as inherent
processing modes from the other parts of the parent
system.



[0036] As mentioned earlier, the BMME methods presented
here share a common component, called the small diamond
search pattern (SDSP), in their local refined search.
Furthermore, since digital video comes in two kinds,
non-interlaced (or progressive-scanned) and interlaced,
search patterns are needed for each of these two
categories. The design of the search patterns and their
associated search strategy (or procedures) is instrumental
in producing a faster search and more accurate motion
vectors. The earlier-developed Diamond Search (DS) search
patterns, which are used in frame-based motion estimation
for non-interlaced video, are shown in FIG. 2; their
counterparts, the large diamond search pattern (LDSP) and
small diamond search pattern (SDSP) for field-based motion
estimation in interlaced video, are shown in FIG. 3. With
this design, the entire DS procedure for the non-interlaced
case can be applied unchanged to interlaced video by using
the search patterns shown in FIG. 3. In addition, the input
video data do not need any extra re-ordering, such as
separating the entire video frame into two fields, the
even field and the odd field.

[0037] Another search pattern, called the hexagon search
pattern (shown in FIG. 4), involves fewer search points
than the above-mentioned diamond search patterns, with
possibly slight degradation of video quality. In
frame-predictive motion estimation for non-interlaced
video, if more of the motion content is along the
horizontal direction, the horizontal hexagon search pattern
(HHSP) can be used; otherwise, the vertical hexagon search
pattern (VHSP) is applied. In field-predictive motion
estimation for interlaced video, only one type of hexagon
pattern, called the interlaced hexagon search pattern
(IHSP), is used throughout for both the even field and the
odd field, as this pattern has an inherently interlaced
structure (with every other line skipped in the search)
and is fairly symmetrical.

[0038] In many typical videos that contain fairly large
motion content (e.g., sports) and/or peculiar motion content
(e.g., cartoon, animation, video games), region-of-support
(ROS) based prediction information and/or temporal
information is very helpful in producing more accurate
motion vector results. Thus, more sophisticated fast
block-matching motion estimation methods are needed and are
invented here. For that purpose, new search patterns, called
the motion adaptive pattern (MAP) and the adaptive rood
pattern (ARP), are introduced. MAP is composed of several
intelligently chosen search positions, which can be formed
from the origin (0, 0) of the current macroblock (or block,
a more general term used interchangeably hereafter), the
predicted motion vector of the chosen ROS (as shown in
FIG. 5) from the spatial domain, temporally nearby motion
vectors, and the computed global motion vector (GMV). For
example, the three motion vectors from type B of FIG. 5,
together with the median-predicted motion vector, (0, 0)
and the GMV, can be the six search points of the MAP in
FIG. 6(a). Hence, MAP has a dynamic or irregular shape
established for each macroblock. ARP, which can be viewed
as a less irregular MAP, is also invented, as shown in
FIG. 6(b). ARP has a rood shape whose four arms always
point east, west, south and north, respectively. The
length of the rood arm, Γ, is adaptively computed for each
block initially; Γ is equal to the maximum of the
city-block distance of the median-predicted motion vector,
based on the ROS chosen. For each block's motion vector
generation, MAP (or ARP, if used) is used only once, at the
initial search stage, to identify the most promising
position from which to begin the local search. Once that
position is found, only SDSP is used throughout the
remaining search process until the motion vector is
found.
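
A rough Python sketch of how the MAP and ARP candidate positions might be assembled is given below. It is illustrative only: the helper names `median_mv`, `map_points` and `arp_points` are hypothetical, the component-wise median is one plausible reading of "median-predicted motion vector", and taking the rood-arm length as the larger absolute component of that vector is an assumption made here because the text does not spell out the computation.

```python
from statistics import median_low
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # (x, y) displacement in pixels


def median_mv(ros: List[MV]) -> MV:
    """Component-wise median of the ROS motion vectors (assumed predictor).
    median_low always returns one of the input values, so the result
    stays an integer vector."""
    return (median_low(v[0] for v in ros), median_low(v[1] for v in ros))


def map_points(ros: List[MV], gmv: Optional[MV] = None) -> List[MV]:
    """Candidate positions of a motion adaptive pattern: the ROS vectors,
    their median prediction, the origin, and the global motion vector
    when one was detected. Duplicate positions are dropped."""
    candidates = list(ros) + [median_mv(ros), (0, 0)]
    if gmv is not None:
        candidates.append(gmv)
    unique: List[MV] = []
    for point in candidates:
        if point not in unique:
            unique.append(point)
    return unique


def arp_points(ros: List[MV]) -> List[MV]:
    """Centre plus four rood-arm tips pointing east, west, south and north.
    The arm length is assumed to be max(|x|, |y|) of the median prediction."""
    mx, my = median_mv(ros)
    arm = max(abs(mx), abs(my))
    tips = [(0, 0), (arm, 0), (-arm, 0), (0, arm), (0, -arm)]
    unique: List[MV] = []
    for point in tips:
        if point not in unique:
            unique.append(point)
    return unique
```

With three ROS vectors, a distinct median, the origin and a detected GMV, `map_points` yields six positions, matching the example in the paragraph above.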

[0039] In our scalable block-matching fast motion
estimation, each method or algorithm is called a profile.
As mentioned earlier, all the profiles share either the
frame-based SDSP (FIG. 2(b)) or the field-based SDSP
(FIG. 3(b)), depending on whether the video is
non-interlaced or interlaced, respectively. In the
following, the search pattern scalable profiles are
individually described; they are directly applicable to
either frame-based or field-based fast motion estimation.

[0040] Profile 1 (or "Simple" Profile): only the
SDSP (FIG. 2(b) for frame-based and FIG. 3(b) for
field-based) is used throughout the entire search.
[0041] That is, in each search stage, the search
point that yields the minimum matching error is
used as the center of the new SDSP for the next
search. This process is repeated until the center
search point of the SDSP yields the minimum
matching error.

[0042] Profile 2 (or "Basic" Profile): either the LDSP
(FIG. 2(a) for frame-based and FIG. 3(a) for
field-based) or a hexagon search pattern (FIGS. 4(a)
and 4(b) for frame-based or FIG. 4(c) for
field-based) is used repeatedly until the pattern's
center position yields the minimum sum of absolute
differences (SAD). At that point, the SDSP (FIG. 2(b)
for frame-based and FIG. 3(b) for field-based) is
used once only, and the position that yields the
minimum SAD is taken as the motion vector found for
that macroblock. Note that when the LDSP is used,
this is basically the DS. (In fact, we can view this
Basic profile as two sub-profiles: the Basic-Diamond
profile and the Basic-Hexagon profile.)

[0043] Profile 3 (or "Pattern Adaptive" Profile):
either SDSP or LDSP is dynamically selected for
each block at its initial search. The decision can
be based on whether the LDSP was ever used during
the search for the earlier-computed neighboring
block(s) in the ROS. If no LDSP was used in the
ROS, only the SDSP is used for the current block's
motion vector generation; otherwise, Profile 2 is
activated. Alternatively, other simple decision
logic (such as a majority vote) could be
practiced.

[0044] Similarly, we can substitute the hexagon
search patterns for the LDSP. In the non-interlaced
case, for frame-based motion estimation, there are
two further choices: HHSP and VHSP, as shown in
FIGS. 4(a) and 4(b), respectively. The decision
could depend on a simple criterion, such as
whether the largest vector component in the ROS
lies in the horizontal direction (use HHSP) or the
vertical direction (use VHSP). Furthermore, once
the HHSP or VHSP is chosen, it can be applied
throughout the search for the current block, or
the two patterns can be switched dynamically
along the way, based on simple decision logic.



[0045] In the interlaced case for performing field- 
based motion estimation, similar search patterns 
(as shown in FIG. 4(c)), practice and criterion can 
be exploited straightforwardly. 

[0046] Hence, this Pattern Adaptive profile can be
viewed as comprising two sub-profiles: the
Diamond Adaptive profile and the Hexagon Adaptive
profile.

[0047] Profile 4 (or "Main" Profile): MAP (or ARP)
is activated for the initial search and performed
once only. The position found to yield the minimum
matching error is taken as the starting position
for the remaining local search, which uses the SDSP
only; that is, the Simple Profile is enabled from
there onwards until the motion vector is found.
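
The Simple and Basic-Diamond profiles described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the matching error is supplied as a caller-provided function `cost` (hypothetical), the pattern offsets follow the usual frame-based five-point and nine-point diamonds, and candidates outside a square window of half-width `sr` are simply skipped.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]

# Frame-based diamond patterns as offsets from the current centre.
SDSP: List[MV] = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
LDSP: List[MV] = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
                  (1, 1), (1, -1), (-1, 1), (-1, -1)]


def best_point(cost: Callable[[MV], float], center: MV,
               pattern: List[MV], sr: int) -> MV:
    """Return the pattern position with the smallest matching error.
    The centre wins ties, so the search stops once nothing is strictly better."""
    best, best_cost = center, cost(center)
    for ox, oy in pattern[1:]:
        cand = (center[0] + ox, center[1] + oy)
        if max(abs(cand[0]), abs(cand[1])) > sr:
            continue  # stay inside the search window
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best


def simple_profile(cost: Callable[[MV], float],
                   start: MV = (0, 0), sr: int = 16) -> MV:
    """Profile 1: re-centre the SDSP until its centre is the minimum."""
    center = start
    while True:
        nxt = best_point(cost, center, SDSP, sr)
        if nxt == center:
            return center
        center = nxt


def basic_diamond_profile(cost: Callable[[MV], float],
                          start: MV = (0, 0), sr: int = 16) -> MV:
    """Profile 2 (diamond variant): repeat the LDSP until its centre is the
    minimum, then apply the SDSP exactly once."""
    center = start
    while True:
        nxt = best_point(cost, center, LDSP, sr)
        if nxt == center:
            break
        center = nxt
    return best_point(cost, center, SDSP, sr)
```

Under this sketch, the Main profile would amount to calling `simple_profile` with `start` set to the best MAP or ARP position instead of the origin.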

[0048] The profiles above demonstrate one way of
categorizing the relevant fast motion estimation methods
and arranging them in a scalable way for optimal usage.
In addition, certain aspects used in our invention are
detailed as follows:

[0049] Initialization: 

[0050] The motion vectors of the blocks outside the
frame are taken to be equal to (0, 0). If the ROS of
the current block is defined as the set of blocks to
the left, above and above-right of the current block
(i.e., type B), for example, the corresponding motion
vectors are denoted by MV_{i-1,j}, MV_{i,j-1} and
MV_{i+1,j-1}, respectively. The search-point
coordinates can be directly established based on the
search patterns LDSP, SDSP, HHSP, VHSP, IHSP, MAP and
ARP as shown in FIGS. 2-6, respectively.

[0051] Furthermore, the global motion vector (GMV)
is predicted from the motion vector field of the
reference frame; note that the GMV is included in
MAP (and ARP) only if global motion is present and
detected in the reference frame.

[0052] Determination of Search Range, sr: 

[0053] The mean ($\mu_x$, $\mu_y$) and standard deviation
($\sigma$) of the motion vectors of the reference frame
are computed. The search range (sr) for the current
frame is given by

$$sr = \max\{16,\ \max(|\mu_x|, |\mu_y|) + 3\sigma\}.$$

[0054] The movement of every search pattern is
restricted to the search window defined by the search
range, sr.
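
The search-range rule can be illustrated with the following Python sketch, which reads the rule as the larger of 16 and the largest absolute mean component plus three standard deviations. The function name `search_range` is hypothetical, and two details are assumptions made here because the text leaves them open: the scalar standard deviation is taken as the larger of the two per-component population standard deviations, and the result is rounded up to a whole pixel.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

MV = Tuple[int, int]


def search_range(ref_mvs: List[MV], floor: int = 16) -> int:
    """Search range for the current frame, derived from the motion
    vectors of the reference frame."""
    if not ref_mvs:
        return floor  # no statistics available, fall back to the minimum
    xs = [v[0] for v in ref_mvs]
    ys = [v[1] for v in ref_mvs]
    mu_x, mu_y = mean(xs), mean(ys)
    # Assumption: a single sigma is obtained as the larger component spread.
    sigma = max(pstdev(xs), pstdev(ys))
    return max(floor, math.ceil(max(abs(mu_x), abs(mu_y)) + 3 * sigma))
```

A static reference frame therefore keeps the range at 16, while a frame with large or widely spread motion vectors widens the window for the next frame.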

[0055] Detection of No-Motion Activity 

[0056] If the matching error for the current block at
the position (0, 0) is less than a threshold T, then the
block belongs to the no-motion activity region. In
this case, the search ends here with the motion vector
for the current block equal to (0, 0). There are two
options for choosing the threshold: a fixed threshold
(we choose T = 512, which is quite robust for all
kinds of video while keeping quality degradation
unnoticeable) and an adaptive threshold, described as
follows.

[0057] For the adaptive threshold, a pre-judgement
threshold map (PTM) is established for each video
frame. Assume, for illustration, that the sum of
absolute differences (SAD) is the matching error
criterion. Let PTM(i, j, t) be the threshold for the
current block (i, j) in frame t, and let SAD(i, j, t-1)
be the prediction error that resulted at the same block
position in the previous frame, t-1. Then PTM(i, j, t)
can be established as

$$PTM(i, j, t) = SAD(i, j, t-1) + \delta,$$

[0058] where $\delta$ is an adjustment parameter that
provides some tolerance for effects such as the GMV and
the fluctuation of prediction error among temporally
neighboring blocks.
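
The two threshold options for no-motion detection can be sketched as follows. The function names `is_no_motion` and `adaptive_thresholds` are hypothetical, the per-block errors of the previous frame are assumed to be stored as a list of rows, and the adjustment value used in the usage note is an arbitrary example rather than a recommended setting.

```python
from typing import List

FIXED_THRESHOLD = 512  # fixed threshold value stated in the text


def is_no_motion(sad_at_origin: int, threshold: int = FIXED_THRESHOLD) -> bool:
    """A block is treated as static when its matching error at (0, 0)
    is strictly below the threshold; its motion vector is then (0, 0)."""
    return sad_at_origin < threshold


def adaptive_thresholds(prev_sad: List[List[int]], delta: int) -> List[List[int]]:
    """Pre-judgement threshold map for frame t: the final prediction error
    of each block position in frame t-1 plus an adjustment delta."""
    return [[s + delta for s in row] for row in prev_sad]
```

For example, `adaptive_thresholds([[100, 700]], 32)` gives `[[132, 732]]`, so a block whose previous error was 100 is declared static in the current frame only if its error at (0, 0) is below 132.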

[0059] Determination of Nonzero Motion Activities 

[0060] The ROS of the current block consists of its
spatially and/or temporally adjacent blocks whose
motion vectors have already been determined at an
earlier stage. In our invention, the local motion vector
field at the current block's position is defined as the
set of motion vectors of the blocks belonging to the ROS
of the current MB. The motion activity at the current
block is defined in the present invention as a general
function f of the motion vectors in its ROS. Let the
evaluated numerical value of function f at the current
block be L. We define function f as the maximum of the
city-block lengths in our invention. The motion activity
at the current block is classified into different
categories, such as "low", "medium" and "high", based on
the value of L. Let A and B be two numbers such that
A < B. Then the procedure to obtain these categories is
as follows:

[0061] Motion activity = Low, if L is less than or equal
to A

[0062] = Medium, if L is greater than A and less than
or equal to B

[0063] = High, if L is greater than B.

[0064] We choose A = 1 and B = 2 in our invention for the
full-pel case. For half-pel and quarter-pel cases,
parameters A and B can be scaled and chosen
accordingly.
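
The classification rule can be written compactly as below. This is an illustrative sketch: the function name `motion_activity` and the string labels are assumptions, while the city-block length |x| + |y| and the default full-pel limits of 1 and 2 follow the text.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def motion_activity(ros_mvs: List[MV], a: int = 1, b: int = 2) -> str:
    """Classify local motion activity from the ROS motion vectors.
    L is the largest city-block length |x| + |y| among them."""
    length = max(abs(x) + abs(y) for x, y in ros_mvs)
    if length <= a:
        return "low"
    if length <= b:
        return "medium"
    return "high"
```

For half-pel or quarter-pel vectors stored in sub-pixel units, the same function could be called with `a` and `b` scaled by 2 or 4 respectively.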

[0065] Prediction of the Search Center 

[0066] The search center can be selected from the MAP
of the current block: the motion vector in the MAP
that gives the minimum matching error is chosen as
the search center.

[0067] The selection of search center could also 
depend on the local motion activity at the current 
block position. If the motion activity is low or 
medium, the search center is the origin. Otherwise, 
the motion vector in the ROS of the current block 
that gives the minimum matching error is chosen as 
the search center. 
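
The activity-based choice of search center can be sketched as follows. The function name `predict_search_center` is hypothetical, the matching error is passed in as a caller-supplied function `cost`, and `high_limit` stands for the upper limit B of the medium category (2 in the full-pel case).

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]


def predict_search_center(ros_mvs: List[MV],
                          cost: Callable[[MV], float],
                          high_limit: int = 2) -> MV:
    """Start at the origin for low or medium activity; otherwise start at
    the ROS motion vector with the smallest matching error."""
    length = max(abs(x) + abs(y) for x, y in ros_mvs)
    if length <= high_limit:
        return (0, 0)
    return min(ros_mvs, key=cost)
```

Only the high-activity branch evaluates the matching error, so blocks in quiet regions pay no extra comparisons before the local search begins.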

[0068] Search Distance Computing Scalability: Sub- 
Sampled Computation for Macroblock Distance Measure- 
ment 

[0069] At each search point visited, the distance between the two
macroblocks under measurement needs to be computed
and used in ranking later on. To effectively reduce the
computation, not all the pixels within the block need to be
counted in the distance computation. Hence, sub-sampled
computation (say, downsampled by a factor of two in both the
horizontal and vertical directions) can be practiced. Note
that the relevant thresholds shall be adjusted accordingly,
where applicable.
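As an illustration, a sub-sampled sum of absolute differences could look like the sketch below. It assumes blocks are given as lists of rows of integer pixel values; the function name and the `step` parameter are hypothetical.

```python
def subsampled_sad(cur, ref, step=2):
    """Sum of absolute differences over every `step`-th pixel in both directions.

    cur, ref -- equally sized blocks as lists of rows of pixel values
    step     -- sub-sampling factor (1 counts every pixel)
    """
    total = 0
    for y in range(0, len(cur), step):
        for x in range(0, len(cur[0]), step):
            total += abs(cur[y][x] - ref[y][x])
    return total
```

With `step=2` only about one quarter of the pixels contribute, so a threshold tuned for the full distance would need to be scaled down in a comparable way.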

[0070] Updating "f_code" 

[0071] The "f_code" is a special code used in the
motion estimation part of the international video
coding standard MPEG-4. The motion activity
information computed as described above can be used to
update the f_code, for the purpose of video indexing
and other multimedia applications. Since the global
motion activity information controls the search
range parameter, the search range can in turn be used
to update the f_code.

[0072] While the above can be used in the present inven- 
tion, various changes can be made, for example, instead of 
the above-mentioned search patterns, any other symmetric 
search patterns might be used. Also, in determining the 
no-motion activity, instead of comparing the matching error 
of the current block with a fixed threshold, any other 
matching metric of the current block may be compared with 
a threshold. Likewise, when using adaptive threshold, 
exploiting a memory map of the previous frame for the 
current frame should be considered as a redundant practice. 
The function f might be any function of its member motion 
vectors. For example, the function may evaluate the maxi- 
mum of the lengths of the motion vectors or the area 
enclosed by the motion vectors, etc. The motion activity can 
be classified into more, or less, than the categories men- 
tioned, and the methods for selection of the search center 
and search strategies can be used in any other combinations 
other than those described above. All the above-mentioned 
can be directly applied to video 'frames' or 'fields' in the 
context. 

[0073] Part B. A Method and Apparatus of 2-D Systolic 
Array Implementation for Diamond Search Motion Estima- 
tion 

[0074] This part utilizes a systolic array architecture to 
implement the diamond search fast motion estimation so as 
to speed up the motion vector generation process. As illus- 
trated in FIG. 7, the proposed system architecture of this 
component comprises the following parts: (1) 2-D Systolic 
Array, (2) Memories, (3) Control Logic Unit, and (4) Com- 
parison Unit. 

[0075] The 2-D Systolic Array consists of a planar
arrangement of multiple processing elements (PEs), which
perform the arithmetic computations to acquire the sum of
absolute differences (SAD) value for each checking
point in the diamond search motion estimation method. The
results are sent to the Comparison Unit to decide the final
motion vector. Memory 1 and Memory 2 are employed to
store the current-block data (Cur) and the reference-block
data (Ref) to be compared, respectively. The Control Logic Unit
generates the memory addresses and controls the PE
operations in the systolic array.

[0076] The 2-D systolic array diagram is shown in FIG. 8.
The current-block data Cur and the reference-block data Ref
are input to the array from its left line and bottom line,
respectively. The resulting SAD values are output from
the top line of the array.

[0077] The whole array consists of P×3 PEs that are
arranged in P rows and 3 columns, where P is the width of
the current block (in the following, P=16 for demonstration).
In each PE, the difference, the absolute-value operation and
the summation are performed sequentially. FIG. 9 shows the
block diagram of the PE structure, where c, r and m
represent Cur, Ref and SAD, respectively.
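The PE operation and the subsequent comparison can be modelled functionally as below. This is a behavioural sketch only, not a cycle-accurate model of the array: it ignores the pipelined data timing, and the function names are hypothetical.

```python
def pe(c, r, m_in):
    """One processing element: difference, absolute value, then accumulate."""
    return m_in + abs(c - r)


def column_sad(cur_block, ref_block):
    """Chain PE operations over all pixel pairs to obtain one SAD value."""
    m = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        for c, r in zip(cur_row, ref_row):
            m = pe(c, r, m)
    return m


def best_checking_point(cur_block, candidates):
    """Pick the motion vector with the smallest SAD.

    candidates -- dict mapping a motion vector to its reference block
    """
    sads = {mv: column_sad(cur_block, ref) for mv, ref in candidates.items()}
    return min(sads, key=sads.get), sads
```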

[0078] Memory 1 is composed of P modules where each
module contains Q pixels, and Q is the height of the
current block (Q=16 for a normal macroblock). Memory 2
has P+8 modules, which contain all the reference-block data
for the surrounding checking points of one large diamond
search pattern (LDSP) as described above, so that no
memory swap is required when the checking point is moved
from one large diamond search (LDS) to another LDS.

[0079] Each module contains Q+8 pixels, i.e., 24×8 bits
for normal motion estimation. To supply the reference-block
data to the boundary PEs, two barrel shifters are
employed to interface the memory and the boundary PEs,
wherein each shifter contains P+8 registers. With the aid of
the shifters, the data from the corresponding modules are
accessed by left-shift or right-shift operations when the
checking point to be processed is moved horizontally from
one position to another. The interface connections among the
memory, the barrel shifters and the systolic array are shown
in FIG. 10.
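The sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes 8 bits per pixel (as implied by "24×8 bits") and that the Q+8 pixel count applies to each Memory 2 module; the function and key names are hypothetical.

```python
def memory_layout(p=16, q=16, margin=8, bits_per_pixel=8):
    """Module counts and total sizes for Memory 1, Memory 2 and the shifters."""
    return {
        "memory1_modules": p,
        "memory1_bits": p * q * bits_per_pixel,
        "memory2_modules": p + margin,
        "memory2_pixels_per_module": q + margin,
        "memory2_bits": (p + margin) * (q + margin) * bits_per_pixel,
        "shifter_registers": p + margin,
    }
```

For P=Q=16 this gives 24 modules of 24 pixels each for Memory 2, which matches the 24-module memory shown in FIG. 10.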

[0080] The Control Logic Unit generates all the required
control signals for the memories, the array and the comparison
logic. An accurate timing mechanism has to be provided to
coordinate the whole data flow of the operations.

[0081] FIG. 11 demonstrates the time scheduling for the
current-block data and the reference-block data when the
LDS is performed in the systolic array. The actual positions
that the subscripts represent in the current and reference
images are illustrated in FIG. 12. As shown in FIG. 11, the
current-block data are input to the array in a pipelined
mode, whereas the reference-block data are supplied in a
parallel manner. Notice that two idle cycles (slot 1 and slot
2 in FIG. 11) are required in order to initiate the PE
operations.

[0082] The Comparison Unit compares the SAD results 
from the three PE columns individually and chooses the 
motion vector where the minimal SAD value occurs in the 
diamond search. The generated motion vector will be fed 
into the Control Logic Unit to guide the next search position 
and perform the above-mentioned steps. 

[0083] Part C. A Method for Extracting Motion Trajecto- 
ries of Moving Video Objects based on Motion Vectors 

[0084] The invention of extracting motion trajectories of
moving video objects (VOs) based on macroblock motion
vectors (MVs) comprises three phases: 1) motion-vector
field denoising, 2) unsupervised clustering and 3) bidirectional
motion tracking.

[0085] 1. Motion-Vector Field Denoising

[0086] The motion-vector field, extracted directly from MPEG
or H.26x bitstreams or generated using the techniques
described in Part A, is first filtered by a proposed noise
adaptive soft-switching median (NASM) filter with the
architecture shown in FIG. 13(a). The NASM filter contains a
switching mechanism steered by a three-level decision-making
process that classifies each MV as one of the four
identified MV types outlined in FIG. 13(b). Subsequently,
appropriate filtering actions are invoked accordingly.

[0087] 1.1 Soft-Switching Decision Scheme 

[0088] The first level involves the identification of true
MVs. A standard vector median (SVM) filter with an adaptive
window size of W_D1×W_D1 is applied to obtain a
smoothed MV field. MV-wise differences between the
original MV field and the smoothed MV field are computed.
True MVs are identified to be the ones with much smaller
differences. To be adaptive to different amounts of irregular
MVs, the steps are performed twice: first with a 7×7 SVM
filter to estimate the percentage of irregularity q, and then
with the appropriate window size selected by referring to Table 1.
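A vector median filter and the Table 1 lookup can be sketched as follows. Assumptions of this sketch, none of which are stated in the text: distances are Euclidean, windows are truncated at the field border, and values of q above 70% are rejected because Table 1 gives no entry for them. All function names are hypothetical.

```python
import math


def vector_median(vectors):
    """Return the vector whose summed distance to all vectors is smallest."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))


def svm_filter(field, size=3):
    """Apply a size x size vector median to a 2-D grid of (dx, dy) MVs."""
    h, w = len(field), len(field[0])
    half = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [field[j][i]
                      for j in range(max(0, y - half), min(h, y + half + 1))
                      for i in range(max(0, x - half), min(w, x + half + 1))]
            row.append(vector_median(window))
        out.append(row)
    return out


def suggested_window(q):
    """Window size from Table 1 for irregularity q in percent (None = no filtering)."""
    if q <= 2:
        return None
    for upper, size in ((15, 3), (30, 5), (45, 7), (60, 9), (70, 11)):
        if q <= upper:
            return size
    raise ValueError("Table 1 gives no window size for q above 70%")
```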

[0089] Two optimal partition parameters p_l and p_u are
derived as two boundary positions. All MVs with differences
falling within this range are considered as true MVs. Denote
x_0 ≤ x_1 ≤ . . . ≤ x_m as the bin indices of the error
histogram. Each n_i (for i=0, 1, . . . , m) indicates the number
of elements falling in bin i. Parameter p_u is given by

p_u = [equation illegible in source]   (EQU 1)



[0090] A similar analysis is repeated for the negative part of
the distribution. Let the bin indices be x_{−n} ≤ x_{−n+1} ≤ . . . < 0, and
let n_i represent the number of elements in bin i. Parameter p_l is
given by

p_l = [equation illegible in source]   (EQU 2)



[0091] The percentage of irregularities q is conservatively
determined by subtracting the percentage of true MVs from
one hundred percent.

[0092] The second level involves the identification of
isolated irregular MVs. Given an MV as the center MV within
a W_D2×W_D2 decision window, the membership values of its
neighboring MV_{s,t} within the decision window are defined
as

μ_{s,t} = [equation illegible in source]   (EQU 3)

[0093] for −(W_D2−1)/2 ≤ s, t ≤ (W_D2−1)/2 and (s, t) ≠ (0, 0).
Parameters d_{s,t} and d_{u,v} are the magnitude-wise differences
of MV_{s,t} and MV_{u,v} with respect to the center MV.
Parameters u and v have the same value range as s and t,
i.e., −(W_D2−1)/2 ≤ u, v ≤ (W_D2−1)/2 and (u, v) ≠ (0, 0).
Starting with W_D2=3, the decision window repeatedly
extends outwards by one unit on all four window sides
provided that the number of true MVs is less than
(W_D2×W_D2)/2, or until W_D2=W_D1. That is, parameter W_D2 is an odd
integer that satisfies 3 ≤ W_D2 ≤ W_D1.
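The window-growing rule can be illustrated with the sketch below. It assumes a boolean map marking which MVs were identified as true at the first level, counts every flagged position inside the window, and truncates the window at the field border; these details and the function names are assumptions, not taken from the text.

```python
def count_true(true_mask, cy, cx, w):
    """Number of true MVs inside the w x w window centred at (cy, cx)."""
    half = w // 2
    rows = range(max(0, cy - half), min(len(true_mask), cy + half + 1))
    cols = range(max(0, cx - half), min(len(true_mask[0]), cx + half + 1))
    return sum(1 for y in rows for x in cols if true_mask[y][x])


def decision_window_size(true_mask, cy, cx, w_d1):
    """Grow the decision window from 3 up to w_d1 in steps of one unit per side."""
    w = 3
    while w < w_d1 and count_true(true_mask, cy, cx, w) < (w * w) / 2:
        w += 2  # one unit outwards on all four sides
    return w
```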






[0094] The mean of μ_{s,t} is used to divide the membership
map μ_{s,t} into two groups, the lower- and higher-value groups,
denoted by μ_low and μ_high. The decision rule for detecting an
isolated irregular MV is defined by:

[0095] (i) If μ_low/μ_high [threshold condition illegible in source],
the center MV is claimed as an isolated irregular MV.

[0096] (ii) If μ_low/μ_high [complementary condition illegible in source],
further discrimination at the third level will be required.

[0097] The third level distinguishes the considered center
MV as being either a non-isolated irregular MV or an edge MV.
The algorithm checks each side of the window
boundary of the W_D2×W_D2 window obtained in level two. If there
are MVs closely correlated to the center MV at
any one of the four boundaries, that boundary is
extended by one pixel position to obtain an
enlarged window. Denote N_c as the number of "closely
correlated MVs" within the enlarged window. The decision
rules for discriminating between a non-isolated irregular MV and an
edge MV are:

[0098] (i) If N_c < S_in, the considered MV is a
non-isolated irregular MV; otherwise,

[0099] (ii) If N_c ≥ S_in, the considered MV is an edge MV.

[0100] Threshold S_in is conservatively defined to be half of
the total number of uncorrupted MVs within the enlarged
window.

[0101] 1.2 Filtering Scheme 

[0102] Identified true MVs are left unaltered in order
to preserve the fine details of the MV field. Standard vector
median (SVM) and an invented fuzzy weighted vector
median (FWVM) filters are exploited for irregular MVs and
edge MVs, respectively. For the proposed FWVM filter, the
fuzzy membership values μ_{s,t} computed earlier are re-used
to determine the weights of true MVs within a W_D×W_D
filtering window. The weighting factors of those considered
true MVs are defined to be

μ_{s,t} / λ, for (s, t) ≠ (0, 0),   (EQU 4)

[0103] where λ = Σ μ_{s,t} + μ_{0,0} and μ_{0,0}/λ is the weighting factor
assigned to the center MV. Parameter μ_{0,0} is optimally
determined by minimizing the output data variance such that the
noise attenuation will be maximized, which is given by

μ_{0,0} = [equation illegible in source], for (s, t) = (0, 0).   (EQU 5)
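For illustration, a generic weighted vector median can be written as below. This is the textbook form of the operator, with Euclidean distance assumed; it takes the weights as given and does not reproduce the membership-derived weights or the optimal center weight of the FWVM filter. The function name is hypothetical.

```python
import math


def weighted_vector_median(vectors, weights):
    """Return the vector minimizing the weighted sum of distances to all vectors."""
    def cost(v):
        return sum(w * math.dist(v, x) for x, w in zip(vectors, weights))
    return min(vectors, key=cost)
```

With equal weights the result coincides with the standard vector median; raising the weight of one vector pulls the output towards it.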



[0104] 2. Maximum Entropy Fuzzy Clustering 

[0105] The NASM-filtered MVs are then grouped around an
optimum number of cluster centers by our invented unsupervised
maximum entropy fuzzy clustering (MEFC) to
segment the MV field into homogeneous motion regions. FIG.
14 shows the architecture of the MEFC.



[0106] 2.1 Outer Loop 

[0107] The outer loop recursively increases the number of
clusters c from 1 until it reaches a pre-determined
maximum value c_max, i.e., c=1, 2, . . . , c_max. In each
outer-loop iteration, a new cluster center is initialized
to split the largest cluster into two smaller clusters based on
the measured fuzzy hypervolume. Denote the input MVs as
{x_i | x_i ∈ ℜ^2 and i=1, 2, . . . , N} and the corresponding cluster
centers as {c_j | c_j ∈ ℜ^2 and j=1, 2, . . . , c}. Initially, all data
samples are considered to belong to one dominant cluster. That
is, c=1 and c_1 = Σ_{i=1}^{N} x_i / N. This dominant cluster is then
optimally split into two according to

{x_i ∈ X | max d(x_i, c_1)}.   (EQU 6)

[0108] In the subsequent iterations, each new cluster center
is initialized according to

{x_i ∈ X | max d(x_i, c_L) and μ_iL > ξ},   (EQU 7)

[0109] where ξ is a pre-determined confidence limit to
claim a data sample to be strongly associated with the cluster
center of the largest cluster c_L.
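The initialization of a new cluster center can be sketched as picking, among the samples strongly associated with the largest cluster, the one farthest from that cluster's center. The sketch assumes Euclidean distance and that the memberships with respect to the largest cluster are already available; the function and parameter names are hypothetical.

```python
import math


def init_new_center(samples, memberships, largest_center, limit):
    """Pick the farthest sample whose membership exceeds the confidence limit.

    samples        -- list of feature vectors (e.g. 2-D motion vectors)
    memberships    -- membership of each sample in the largest cluster
    largest_center -- center of the largest cluster
    limit          -- confidence limit on the membership value
    Returns None when no sample passes the limit.
    """
    strong = [x for x, mu in zip(samples, memberships) if mu > limit]
    if not strong:
        return None
    return max(strong, key=lambda x: math.dist(x, largest_center))
```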

[0110] 2.2 Inner Loop 

[0111] The inner loop recursively updates the membership
values and the cluster centers so that the newly initialized
and existing cluster centers converge to their respective new
optimum positions. The process involves (i) updating the
membership values of all the data samples with respect to the
cluster centers determined in the previous iteration, and (ii)
computing the new cluster centers' positions based on the
membership values computed in the current iteration. That
is, denote U = [μ_ij] in the fuzzy membership domain M_f
and C = [c_j] in the feature space ℜ. The inner process can be
presented as recursively updating the following steps

U = F(C), where F maps the feature space ℜ to the membership domain M_f,

C = G(U), where G maps the membership domain M_f to the feature space ℜ.   (EQU 8)

[0112] These two steps alternately update each other until
a convergence condition is met, that is, |U(t+1)−U(t)| < τ,
where τ is a small value.

[0113] The membership function μ_ij of the MEFC is
derived by maximizing the entropy subject to minimizing
a fuzzy weighted distortion measurement. The membership
function is derived to be

μ_ij = [equation illegible in source]   (EQU 9)

[0114] The Lagrange multiplier introduced
in the derivation is coined the "discriminating factor".
Its optimal value for i=1, 2, . . . , N is obtained to be

[equation illegible in source; the legible residue shows a log(ε) term]   (EQU 10)

[0115] where ε is a small value and d_ip(min) is the distance
of each x_i from its nearest cluster center c_p, i.e.,
d(x_i, c_p) < d(x_i, c_q) for q=1, 2, . . . , c and q≠p.

[0116] The updating expression for cluster centers c_j
is given by

c_j = [equation illegible in source]   (EQU 11)
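The alternating update of memberships and centers can be sketched as below. Because the membership and center equations are not legible in this copy, the sketch substitutes the commonly used maximum-entropy (Gibbs) membership with a single fixed multiplier `beta` and a membership-weighted mean for the centers. These are assumptions and not necessarily the exact expressions of EQU 9 to EQU 11; all names are hypothetical.

```python
import math


def update_memberships(samples, centers, beta):
    """Gibbs-form memberships: exp(-beta * d^2), normalized over the clusters."""
    U = []
    for x in samples:
        weights = [math.exp(-beta * math.dist(x, c) ** 2) for c in centers]
        total = sum(weights)  # may underflow to 0 for very distant samples
        U.append([w / total for w in weights])
    return U


def update_centers(samples, U, n_clusters):
    """Each center is the membership-weighted mean of the samples."""
    dim = len(samples[0])
    centers = []
    for j in range(n_clusters):
        mass = sum(row[j] for row in U)
        centers.append(tuple(
            sum(U[i][j] * samples[i][k] for i in range(len(samples))) / mass
            for k in range(dim)))
    return centers


def inner_loop(samples, centers, beta=1.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until the memberships stop changing."""
    U = update_memberships(samples, centers, beta)
    for _ in range(max_iter):
        centers = update_centers(samples, U, len(centers))
        U_new = update_memberships(samples, centers, beta)
        change = max(abs(a - b)
                     for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change < tol:
            break
    return U, centers
```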



[0117] To identify the optimum number of clusters c,
cluster validity is formulated in terms of intra-cluster
compactness and inter-cluster separability to measure the
clustering performance of each c value. The cluster's
compactness is defined as

P_c = [equation illegible in source]   (EQU 12)

[0118] where F_j is the covariance matrix of the jth cluster
(the accompanying definitions, which involve a normalization
by N and det(F_j), are illegible in the source). For measuring
inter-cluster separability, the principle of minimum entropy
is exploited to be

E_c = [equation illegible in source]   (EQU 13)

[0119] Since we aim to maximize P_c and minimize E_c for
cluster number c, we have the cluster validity measurement
defined to be

V_c = [equation illegible in source]   (EQU 14)

[0120] The formulated cluster validity V_c allows
the evaluation of the clustering performance for each cluster
number c. MVs will be segmented into an optimum number
of regions since the optimal cluster number corresponds to
the one that gives the maximum value of V_c.

[0121] 3. Bidirectional Motion Tracking for Motion Trajectory Extraction

[0122] A bidirectional motion tracking process is then
performed to form valid VOs from the segmented homogeneous
motion regions. The bidirectional motion tracking is
structured into three successive steps, involving bidirectional
projection, motion trajectory extraction and Kalman-filter
smoothing, as shown in FIG. 15.

[0123] 3.1 Bidirectional Projection

[0124] Validated VOs O_k(n−n_p) from the previous P-frame
and segmented regions R_l(n+n_f) from the future P-frame are
bidirectionally projected onto the current frame based on a
second-order kinematics model. Motion characteristics of
O_k(n−n_p) and R_l(n+n_f) are assumed to be constant in the
projection process. Thus, by forwardly projecting O_k(n−n_p)
onto the current frame, the resulting displacements in the
right and down directions can be respectively expressed
by

D_r = v_r^P × n_p,   (EQU 15)

D_d = v_d^P × n_p.   (EQU 16)

[0125] The velocities of O_k(n−n_p) in both directions are
given by

v_r^P = MV_r / n_ref pixel/frame,   (EQU 17)

v_d^P = MV_d / n_ref pixel/frame,   (EQU 18)

[0126] where MV_r and MV_d are the mean MV components of
O_k(n−n_p) in the right and down directions,
and n_ref is the number of frames from the reference frame.
By the same principles, the displacements in the right and
down directions for R_l(n+n_f) in the future frame are
expressed as

D_r = −v_r^F × n_f,   (EQU 19)

D_d = −v_d^F × n_f.   (EQU 20)
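The projection arithmetic can be sketched as below, under the reading that the velocity is the mean MV divided by the number of frames to the reference frame, that forward projection multiplies it by n_p, and that backward projection from the future frame multiplies it by n_f with the sign reversed. The source equations are partly illegible, so this reading and the function names are assumptions.

```python
def velocity(mean_mv, n_ref):
    """Velocity (right, down) in pixel/frame from the mean MV of an object."""
    return (mean_mv[0] / n_ref, mean_mv[1] / n_ref)


def forward_displacement(mean_mv, n_ref, n_p):
    """Displacement of a previous-frame VO projected onto the current frame."""
    v_r, v_d = velocity(mean_mv, n_ref)
    return (v_r * n_p, v_d * n_p)


def backward_displacement(mean_mv, n_ref, n_f):
    """Displacement of a future-frame region projected back onto the current frame."""
    v_r, v_d = velocity(mean_mv, n_ref)
    return (-v_r * n_f, -v_d * n_f)
```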

[0127] 3.2 Motion Trajectory Extraction 

[0128] Each segmented region obtained after the MEFC
process may be a valid VO, a section of a valid VO, or
even a large region that encompasses several VOs. To identify
the semantic meaning conveyed by each segmented region
(i.e., unconnected or connected region), our strategy is to
identify the various possible scenarios that have caused the
generation of the segmented regions.

[0129] For unconnected regions, let event A describe the
interaction between segmented region R_l(n) of the current frame
and the projected VO(s) O_k(n−n_p) from the previous frame,
given by

A = {A_1, A_2, A_3},   (EQU 21)

[0130] where

[0131] A_1 = event "Considered unconnected region
overlaps with one projected VO's motion mask from
the previous frame,"

[0132] A_2 = event "Considered unconnected region
overlaps with multiple projected VOs' motion masks
from the previous frame,"

[0133] A_3 = event "Considered unconnected region
overlaps with none of the projected VOs' motion
masks from the previous frame," and

B = {B_1, B_2, B_3},   (EQU 22)

[0134] where

[0135] B_1 = event "Considered unconnected region
overlaps with one projected homogeneous-motion
region from the future frame,"

[0136] B_2 = event "Considered unconnected region
overlaps with multiple projected homogeneous-motion
regions from the future frame,"

[0137] B_3 = event "Considered unconnected region
overlaps with none of the projected homogeneous-motion
regions from the future frame."

[0138] Actions to be taken for the various combinations of
events (A, B) are grouped into four cases as tabulated in
Table 2. In Case 1, R_l(n) is mapped to O_k(n−n_p). In Case 2,
R_l(n) is mapped to the projected VO that gives the minimum
discrepancy in motion direction. In Case 3, R_l(n) is identified
to be a new VO. In Case 4, region R_l(n) is spurious noise
and is subsequently discarded.
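The case selection for unconnected regions can be sketched as a small lookup that follows Table 2: events A_1 and A_2 decide the case on their own, and event B only matters when the region overlaps no projected VO. The function name and the use of overlap counts as inputs are assumptions of this sketch.

```python
def unconnected_case(n_prev_overlaps, n_future_overlaps):
    """Return the case number (1 to 4) for an unconnected region.

    n_prev_overlaps   -- number of projected previous-frame VO masks overlapped
    n_future_overlaps -- number of projected future-frame regions overlapped
    """
    if n_prev_overlaps == 1:
        return 1  # A_1: map to the overlapped VO
    if n_prev_overlaps > 1:
        return 2  # A_2: map to the VO with the closest motion direction
    # A_3: no overlap with any projected VO
    return 3 if n_future_overlaps >= 1 else 4  # new VO, or spurious noise
```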

[0139] Connected regions interact with the projected
O_k(n−n_p) from the previous frame and R_l(n+n_f) from the
future frame in the same way as unconnected
regions, as described by events C and D as follows

C = {C_1, C_2, C_3},   (EQU 23)

[0140] where

[0141] C_1 = event "Both the considered connected
regions are associated to the same projected VO's
motion mask from the previous frame,"

[0142] C_2 = event "Both the considered connected
regions are associated to two different projected
VOs' motion masks from the previous frame,"

[0143] C_3 = event "Both the considered connected
regions are associated to none of the projected VOs'
motion masks from the previous frame," and

D = {D_1, D_2, D_3},   (EQU 24)

[0144] where

[0145] D_1 = event "Both the considered connected
regions are associated to the same projected
homogeneous region from the future frame,"

[0146] D_2 = event "Both the considered connected
regions are associated to two different projected
homogeneous-motion regions from the future frame,"

[0147] D_3 = event "Both the considered connected
regions are associated to none of the homogeneous-motion
regions from the future frame."

[0148] Table 3 summarizes the actions to be taken for
the different combinations of events (C, D). In Case 5, the
connected regions are merged together to form a valid VO,
i.e.,

O_k(n) = ∪ R_l(n).

[0149] In Case 6, the connected regions are split into
separate and independent VOs, which are mapped separately
to different projected VOs O_k(n−n_p). In Case 7, the connected
regions are merged together to form a new VO. In Case 8,
more information from future frames is required to further
discriminate the connected regions as (i) different parts of a
valid VO or (ii) independent valid VOs that are initially located
close to each other and will eventually separate into independent
regions. In Case 9, region R_l(n) is identified to be
spurious noise. Thus, the region should be discarded as in
Case 4.

[0150] Checking for abruptly missing VOs is also performed.
If this happens, the VO's mask from the previous frame is
forward projected onto the current frame based on the second-order
kinematics model to estimate the new position in the current
frame.

[0151] Subsequently, motion trajectories of the VOs are
estimated by taking the centroid of each VO in each frame,
i.e.,

[equation illegible in source]   (EQU 25)

[0152] where C_{O_k}(t) is the centroid of VO O_k(t) at frame t.

[0153] 3.3 Kalman Filter Smoothing

[0154] In the last stage, the obtained motion trajectories
are smoothed by Kalman filtering. The following shows the
formulation of the problem into state-space equations to be
fed into the iteration process of Kalman filtering. The trajectory
of the target VO in the two-dimensional Cartesian plane at time
nT, where 1/T is the frame rate, is defined as

[equation illegible in source]   (EQU 26)



[0155] The displacement, velocity and acceleration vectors
are defined as

\xi(n+1) = \xi(n) + T \dot{\xi}(n) + (1/2) T^2 \ddot{\xi}(n) + \eta_p(n),   (EQU 27)

\dot{\xi}(n+1) = \dot{\xi}(n) + T \ddot{\xi}(n) + \eta_v(n),   (EQU 28)

\ddot{\xi}(n+1) = \ddot{\xi}(n) + \eta_a(n),   (EQU 29)

[0156] where \eta_p(n), \eta_v(n) and \eta_a(n) are the estimation
errors, which individually possess a Gaussian distribution.
Define the state vector of the target VO as
x_i(n) = [\xi_i(n), \dot{\xi}_i(n), \ddot{\xi}_i(n)]^T and the
corresponding process error vector as
v_i(n) = [\eta_{pi}(n), \eta_{vi}(n), \eta_{ai}(n)]^T; hence the state
equation can be expressed as

x_i(n+1) = F x_i(n) + v_i(n),   (EQU 30)

[0157] and the observation vector can be modeled as

z_i(n+1) = H x_i(n+1) + \omega_i(n+1),   (EQU 31)

[0158] where F = (1 T T^2/2; 0 1 T; 0 0 1) and H = (1 0 0).

[0159] With the derived state-space equations given by (30)
and (31), the standard Kalman filter is applied to give
smoothed motion trajectories.
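A standard Kalman filter for this constant-acceleration model can be sketched in plain Python as below, applied to one coordinate of the centroid trajectory at a time. The process noise covariance (q times the identity), the measurement noise variance r, the initial state and the initial covariance are assumptions chosen for illustration; the function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kalman_filter_track(measurements, T=1.0, q=1e-3, r=1.0):
    """Filter a 1-D sequence of centroid positions; returns filtered positions."""
    F = [[1.0, T, 0.5 * T * T], [0.0, 1.0, T], [0.0, 0.0, 1.0]]
    Ft = transpose(F)
    x = [[measurements[0]], [0.0], [0.0]]  # position, velocity, acceleration
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    out = [measurements[0]]
    for z in measurements[1:]:
        # Prediction: x = F x, P = F P F^T + Q
        x = mat_mul(F, x)
        P = mat_mul(mat_mul(F, P), Ft)
        for i in range(3):
            P[i][i] += q
        # Update with H = (1 0 0): only the position is observed
        s = P[0][0] + r
        K = [P[i][0] / s for i in range(3)]
        innovation = z - x[0][0]
        x = [[x[i][0] + K[i] * innovation] for i in range(3)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
        out.append(x[0][0])
    return out
```

The horizontal and vertical centroid coordinates of a VO would each be passed through the filter separately.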

[0160] Part D. Curve Recognition Using Evolutionary
Alignment with Concave-Gap Penalty and Complex Scoring
Matrix Technique

[0161] In this part, we introduce a generic approach to
aligning two given curves under matching and
quantitatively measuring their degree of similarity.
The term "curve" here is a generic representation
or result of tracing the boundary of a shape, drawing a
simple sketch or writing a character/symbol in one continuous
stroke, or any such information generation process or
operation. Note that all one-stroke handwriting curves are
first represented by a chosen chain-coding scheme. The
resulting chain codes, as strings, are considered to be a
special representation describing the curves individually. To
match a pair of curves, their chain-code strings are aligned,
compared, and measured.
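As an illustration of the chain-coding step, the sketch below encodes a one-stroke curve, given as a sequence of grid points, into an 8-directional chain-code string. The direction numbering (0 for a step to the right, increasing counter-clockwise, with y pointing up) is an assumed convention, since the text leaves the coding scheme open; the names are hypothetical.

```python
# Assumed numbering: 0 = right, then counter-clockwise in 45-degree steps.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode consecutive 8-neighbour steps of a curve as a string of digits.

    Raises KeyError if two consecutive points are not 8-neighbours.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return "".join(str(c) for c in codes)
```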

[0162] The evolutionary alignment algorithm is used to
quantitatively measure the similarity between two curves
described by their chain codes. If two curves are quite
similar to each other, most of their chain codes will be
matched, and the remaining chain codes can be altered for
matching by inserting a code, deleting a code, or replacing
a code by another. Each of the above-mentioned operations
incurs a score that contributes to the final matching score,
or similarity score (SS), as follows.

[0163] Given two strings of curves, A = a_1 a_2 . . . a_M and
B = b_1 b_2 . . . b_N, curve A can be matched to curve B by means
of three possible operations: (1) deleting k consecutive
codes, (2) inserting k consecutive codes, and (3) replacing
a code by another. For each above-mentioned symbol
operation, a corresponding scoring method is designed. For
example, a positive score for a perfect matching or an
unchanged replacement can be imposed. The SS is the final
score resulting from matching curve A against curve B by
performing these three symbol operations. That is, the SS is
a quantitative measurement for evaluating the degree of
similarity between curves A and B. Two curves are considered
to be quite similar to each other if the value of SS is
high, and the higher the value, the larger the similarity.

[0164] One constant or function for the cost of opening up
a gap and one constant or function for the cost of inserting
or deleting a code are used. For example, two negative
constants, g and h, are introduced to establish an affine
function:

w(k) = g + hk   (1)



[0165] for the penalty incurred in inserting or deleting k 
symbols. Opening up a gap will cost score g, and each 
symbol inserted into or deleted from the gap will cost 
additional score h; thus, penalty score hk for k symbols. For 
the latter, it means that a set of k symbols from string A is 
deleted, or a set of k symbols from string B is inserted. 

[0166] Replacement costs are specified by a scoring
matrix d(a_i, b_j), which gives the cost of replacing code a_i by
code b_j. Note that a code of A remains unchanged if it is
replaced by itself (i.e., when two codes a_i and b_j are perfectly
matched) and gains the highest score. Usually, d(a_i, b_j)>0 if
a_i=b_j, and d(a_i, b_j)≤0 if a_i≠b_j. For example, in the
application of handwritten character recognition using the
8-directional chain code encoding method:

d(a_i, b_j) = 4, if a_i = b_j; −3, otherwise.   (2)
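A similarity score of this kind can be computed with the standard dynamic-programming scheme for global alignment with affine gap penalties (the Gotoh recurrences), sketched below. This is a textbook technique used here for illustration and is not claimed to be the exact algorithm of the invention. The match and mismatch scores follow the example above; the values g=−4 and h=−1 and the function name are hypothetical.

```python
def similarity_score(a, b, match=4, mismatch=-3, g=-4, h=-1):
    """Global alignment score of strings a and b with gap penalty g + h*k."""
    NEG = float("-inf")
    m, n = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]; X: a[i-1] deleted; Y: b[j-1] inserted.
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    X = [[NEG] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = g + h * i
    for j in range(1, n + 1):
        Y[0][j] = g + h * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = d + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs g + h; extending it costs h.
            X[i][j] = max(M[i - 1][j] + g + h, X[i - 1][j] + h, Y[i - 1][j] + g + h)
            Y[i][j] = max(M[i][j - 1] + g + h, Y[i][j - 1] + h, X[i][j - 1] + g + h)
    return max(M[m][n], X[m][n], Y[m][n])
```

For two chain-code strings such as "0123" and "013", the best alignment keeps three matches and deletes one code, giving 3×4 + (−4 − 1) = 7.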



TABLE 1

Suggested window sizes for various estimated values of parameter q.

Irregular MV density     Suggested W_D1 × W_D1
0% < q ≤ 2%              No filtering
2% < q ≤ 15%             3×3
15% < q ≤ 30%            5×5
30% < q ≤ 45%            7×7
45% < q ≤ 60%            9×9
60% < q ≤ 70%            11×11



[0167]

TABLE 2

Actions to be taken for various combinations of events (A, B).

Events    B_1       B_2       B_3
A_1       Case 1    Case 1    Case 1
A_2       Case 2    Case 2    Case 2
A_3       Case 3    Case 3    Case 4


[0168]

TABLE 3

Actions to be taken for various combinations of events (C, D).

Events    D_1       D_2       D_3
C_1       Case 5    Case 5    Case 5
C_2       Case 6    Case 6    Case 6
C_3       Case 7    Case 8    Case 9
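The case selection for connected regions can be written as a direct lookup following the rows and columns of Table 3. The event labels are passed as strings; the function name and this encoding are assumptions of the sketch.

```python
def connected_case(c_event, d_event):
    """Return the case number (5 to 9) for a pair of events (C, D).

    c_event -- "C1", "C2" or "C3"
    d_event -- "D1", "D2" or "D3"
    """
    if d_event not in ("D1", "D2", "D3"):
        raise ValueError("unknown D event: %r" % (d_event,))
    if c_event == "C1":
        return 5  # merge into the same valid VO
    if c_event == "C2":
        return 6  # split and map to different projected VOs
    if c_event == "C3":
        # 7: merge into a new VO, 8: defer to future frames, 9: noise
        return {"D1": 7, "D2": 8, "D3": 9}[d_event]
    raise ValueError("unknown C event: %r" % (c_event,))
```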
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[0170] While preferred embodiments of the present inven- 
tion have been shown and described, it will be understood by 
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those skilled in the art that various changes or modifications 
can be made without varying from the scope of the present 
invention. 

1. A scalable fast block-matching motion estimation 
method for analyzing a plurality of video frames compris- 
ing: 

employing search pattern scalability (that is, at least one 
block-matching fast motion estimation algorithm pro- 
file selected) and/or search distance computing scal- 
ability (that is, through sub-sampled pixel positions in 
computing macroblock distance) in order to adapt to 
application requirements and system's processing 
modes; 

employing different search patterns designed for
non-interlaced (or progressive-scanned) video;

employing different search patterns designed for
interlaced video;

performing an initial search for each macroblock in
high-motion video and quality-demanded video
applications, in order to find the most promising search
center for the remaining local search, by utilizing either
adaptive rood pattern with a fixed rood-arm length
adaptively determined or motion adaptive pattern, both
based on available neighboring motion vectors from the
spatial and/or temporal domains as well as, optionally,
global motion vector computed;

conducting local search for each macroblock using a 
small diamond pattern; 

conducting an adaptive threshold pre-judgment for detect- 
ing no-motion blocks in the video frames; 

computing global motion activity for each video frame; 

computing local motion activity for each macroblock; 

adapting a search range for each macroblock of each
video frame.

2. The method of claim 1 wherein the profiles are
selected from a group of profile options consisting of simple
profile, basic profile, pattern adaptive profile, and main
profile, being integrated together to form a scalable
block-matching motion estimation architecture.

3. The method of claim 1 wherein all the search patterns
are thus designed and presented in the associated figures for
non-interlaced and interlaced video; furthermore, the pattern
adaptive method could be used for any given pair of two
different search patterns for non-interlaced video and for
interlaced video, respectively.

4. The method of claim 1 wherein the pattern adaptive
profile is adaptively exploited based on the function of
available neighboring motion vectors obtained from the
region-of-support of the current macroblock.

5. The method of claim 1 wherein the adaptive rood
pattern adaptively determined for each macroblock consists
of a regular rood shape with four equal-length rood-arms
plus one predicted search position, and an optional global
motion vector.

6. The method of claim 1 wherein the motion adaptive
pattern adaptively determined for each macroblock consists
of an irregularly formed search pattern based on available
motion-vector positions obtained from the region-of-support
of the current macroblock.



7. The method of claim 1 wherein the sub-sampled pixels 
are achieved by regularly skipping the pixels in both hori- 
zontal and vertical directions for reduced computation of 
macroblock matching distance and thus increased speed-up 
of the motion-vector search process; 

8. The method of claim 1 wherein the pre-determined 
application requirements and system's processing modes 
are: bit rates and delay latency required by selected video 
applications, network traffic conditions, targeted quality of 
service, block sizes, selected motion-estimation prediction 
modes from the group consisting of normal prediction, 
advanced prediction, frame prediction, field prediction,
full-pel, half-pel or quarter-pel prediction, selected video object
planes having different priorities and quality requirements.

9. The method of claim 1 wherein the detection of 
no-motion activity blocks is based on a pre-judgement 
threshold map, which is a function of sum of absolute 
difference plus a deviation factor. 

10. The method of claim 1 wherein the computation of the 
global motion activity for each video frame comprises: 

computing the mean and standard deviation of all of the 
motion vectors in a video frame, a grade of global 
motion activity being defined as the maximum of the 
absolute values of the components of the mean motion 
vector plus three times the standard deviation. 

11. The method of claim 1 wherein the computation of
local motion activity at each block position comprises:

determining city-block lengths of the motion vectors of
those blocks that lie in the region-of-support of a
current macroblock, a maximum of the city-block
lengths being used to measure a grade of local motion activity;

classifying local motion activity at the current block
position into one of the three classes: "low", "medium"
and "high", based on three non-overlapping ranges of
the grade of local motion activity.

12. The method of claim 1 wherein the profile can be
chosen from the profile set based on the grade of local
motion activity measured.

13. The method of claim 1 further comprising adapting
the search range at each block for motion estimation,
wherein the search range of a current frame is adaptively
determined based on the mean and variance information of
a reference frame.

14. The method of claim 1 wherein both the large diamond
search pattern and small diamond search pattern are
individually elongated in both vertical directions by one
pixel each for matching interlaced video data structure.

15. The method of claim 1 further comprising obtaining
an f-code of each video object plane and updating the f-code
for video indexing and retrieval.

16. A block-matching motion estimation apparatus for use 
of a 2-D planar systolic array architecture, said apparatus 
comprising: 

means for simultaneously inputting each row of refer- 
ence-block data of three checking points in a large 
diamond and two checking points in a small diamond, 
into the systolic array from a memory;

means for inputting each row of current-block data into 
the systolic array from a memory; 

means for computing sum of absolute difference (SAD) 
with a planar systolic array of 3 columns and P rows, 
wherein P is the width of the current block; 

means for comparison of the SAD values for nine
checking points in a large diamond and four checking points
in a small diamond to identify a final motion vector, 
where the smallest SAD occurred, and 

memory architecture for storing search-area data and 
current-block data. 
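
For orientation, the SAD comparison over the diamond checking points recited in claim 16 can be modelled in software as below. This is only a behavioural sketch, not the claimed systolic-array hardware. It assumes the commonly used diamond layouts (nine points for the large diamond, and the four non-centre points of the small diamond); the names `LDSP`, `SDSP_NEW`, `sad` and `diamond_search` and the `max_steps` limit are hypothetical.

```python
# Offsets (dx, dy) of the nine large-diamond checking points, centre first.
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]
# The four small-diamond points other than the already evaluated centre.
SDSP_NEW = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def _inside(ref, bx, by, dx, dy, size):
    # True if the displaced block lies fully inside the reference frame.
    return (0 <= by + dy and by + dy + size <= len(ref)
            and 0 <= bx + dx and bx + dx + size <= len(ref[0]))

def diamond_search(cur, ref, bx, by, size, max_steps=16):
    """Return the motion vector (dx, dy) with the smallest SAD found."""
    def cost(d):
        return sad(cur, ref, bx, by, d[0], d[1], size)

    cx, cy = 0, 0
    for _ in range(max_steps):
        cands = [(cx + ox, cy + oy) for ox, oy in LDSP
                 if _inside(ref, bx, by, cx + ox, cy + oy, size)]
        best = min(cands, key=cost)  # ties resolve to the centre (listed first)
        if best == (cx, cy):
            break                    # minimum at centre: switch to small diamond
        cx, cy = best
    cands = [(cx, cy)] + [(cx + ox, cy + oy) for ox, oy in SDSP_NEW
                          if _inside(ref, bx, by, cx + ox, cy + oy, size)]
    return min(cands, key=cost)
```

The sketch evaluates the checking points one after another, whereas the claimed apparatus feeds several checking points into the array simultaneously; only the selection of the smallest SAD is meant to correspond.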

17. The block-matching motion estimation apparatus as 
claimed in claim 16, wherein said apparatus further com- 
prises: 

a plurality of processing elements arranged as a planar 
array of 3 columns and P rows, wherein P is a width of 
a current block; 

means for interconnection of the processing elements, 
wherein each processing element receives source data 
on the current-block from an adjacent processing ele- 
ment in the same row of the array; 

means for interconnection of the processing elements, 
wherein each processing element receives reference-block
data on the checking points in the search window
from an adjacent processing element in a diagonal 
position of the array, and 

means for interconnection of the processing elements, 
wherein each processing element sends an intermediate 
result of a SAD to an adjacent processing element in the 
same column of the array. 

18. The apparatus of claim 16 further comprising a circuit 
for pipeline inputting each row of current-block data into the 
systolic array from a memory, wherein said circuit com- 
prises: 

a plurality of delay registers for data to be inputted in each 
column except for the first one, wherein the number of 
the registers for each column are increased by one, 
counting from one. 

19. The apparatus of claim 18 further comprising a second 
circuit for simultaneously inputting each row of reference- 
block data of three checking points into the systolic array 
from a memory, wherein said second circuit comprises: 

means for connecting a plurality of memory modules to 
the systolic array, at least two barrel shifters being 
employed to select correct data for input; 

means for multiplexing by selectively inputting the ref- 
erence-block data from at least two barrel shifters, and 

a plurality of delay registers for receiving the data to be 
inputted in each column except the first three, wherein 
the number of the registers for each column are
increased by one, counting from one. 

20. The apparatus of claim 16 wherein the memory means 
for storing reference data of the checking points are P+8 
memory modules for storing reference data, wherein P is the 
width of the current block. 

21. The apparatus of claim 17 wherein the processing 
element is composed of one subtractor, one absolute opera- 
tor, one adder and three registers, and each intermediate 
result is outputted after the current and reference data are 
inputted. 

22. The apparatus of claim 19 wherein each barrel-shifter 
is composed of P+8 shift registers, wherein P is the width of 
the current block; data stored in the registers is shiftable 
from or to an adjacent neighborhood by a shift operation, 
one shift completed in one cycle;

each register in each barrel-shifter being connected to
one memory module for data input, and wherein P+2
out of P+8 registers in each barrel-shifter are directly
connected to the multiplex means.

23. The apparatus of claim 19 wherein the multiplex 
means is comprised of P+2 2-to-1 multiplexers.

24. The apparatus of claim 20 wherein the memory 
module stores up to P pixels of the reference data in a search 
window, where P is the width of the current block, and an 
address pointer of the memory module can shift forwards 
and backwards. 

25. A method for using motion vectors to identify moving 
video objects in a plurality of video frames and the corre- 
sponding motion trajectories of the video objects compris- 
ing: 

using an adaptive nonlinear filtering technique to smooth 
a motion vector field;

clustering to segment motion vector field into homoge- 
neous motion regions that have coherent motion indi- 
vidually; and, 

using bi-directional motion tracking to form video objects 
and individually track each video object to obtain the 
motion trajectories thereof. 

26. The method of claim 25 wherein the filtering tech- 
nique comprises: 

classifying each motion vector to be one of four types, 
true motion vector, isolated irregular motion vector, 
non-isolated irregular motion vector and edge motion
vector;

selecting the filtering technique for each type identified, 
the filtering technique consisting of no filtering for a 
true motion vector, a standard vector median filter for 
an irregular motion vector and a fuzzy-weighted 
median filter for an edge motion vector. 
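
A standard vector median filter, as named in claim 26 for irregular motion vectors, selects the vector in the filter window whose summed distance to all other vectors in the window is smallest. The sketch below assumes Euclidean distance; the function name `vector_median` is hypothetical, and the classification step and the fuzzy-weighted variant are not shown because the claim does not specify them.

```python
import math

def vector_median(window):
    """Return the vector median of the motion vectors in a filter window.

    window: list of (dx, dy) tuples, e.g. the 3x3 neighbourhood of a block.
    The Euclidean norm is an assumption; other norms are also in use.
    """
    def total_distance(v):
        # Sum of distances from v to every vector in the window.
        return sum(math.dist(v, u) for u in window)

    return min(window, key=total_distance)
```

Because the output is always one of the input vectors, an isolated outlier such as `(9, -9)` among near-uniform neighbours is replaced by a vector that actually occurs in the neighbourhood.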

27. The method of claim 26 wherein the filtering is 
adjusted automatically in response to the actual percentage of
irregular motion vectors contained in the video frames. 

28. The method of claim 26 wherein filtering is carried out 
only if an estimated percentage of irregularities is more than 
T%, wherein T is optionally equal to 2.

29. The method of claim 25 wherein for experimenting 
each cluster number c out of a pre-selected range, clustering 
comprises: 

sequentially splitting the largest cluster, in terms of fuzzy 
hyper volume measurement, in each outer loop itera- 
tion into two smaller clusters; 

recursively updating the membership functions in the 
inner loop and identifying the new cluster centers; and 

computing the cluster validity measurement VC. 

30. The method of claim 25 wherein bi-directional motion 
tracking comprises: 

performing a bi-directional projection of video objects 
from a previous frame and segmented regions from a 
future frame onto a current frame; 

conducting a validation, and merging and splitting on 
connected and non-connected motion regions to form
valid video objects, and tracking the video objects; and, 

performing a Kalman filter smoothing to obtain smoothed 
motion trajectories for the video objects. 

31. The method of claim 25 wherein the motion vector can
be substituted by image pixel intensity or another type of 
input data, such as optical flow, vector data. 

32. A method for recognizing a handwriting curve or 
retrieving the most closely resembling information items from a
database storing items of such type, comprising:

performing chain coding for each sample stored in the 
database and the input sample to be recognized or 
matched against the database; 

measuring the degree of similarity score between the 
input chain code and each chain-coded information
item stored in database, using evolutionary alignment; 
and 

recognizing the handwritten curve or identifying the most 
matched information items from the database. 
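
The chain-coding step of claim 32 is not tied to a particular code in the claim itself. The following sketch assumes a conventional 8-direction chain code, with 0 pointing along +x and codes increasing counter-clockwise. The names `DIRECTIONS` and `chain_code` are hypothetical, and quantizing each step by the signs of its components is a simplification that suits densely sampled curves.

```python
# Assumed 8-direction code: 0 = +x, numbered counter-clockwise (y pointing up).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def _sign(v):
    return (v > 0) - (v < 0)

def chain_code(points):
    """Convert a sampled curve [(x, y), ...] into a list of direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (_sign(x1 - x0), _sign(y1 - y0))
        if step != (0, 0):  # skip repeated samples
            codes.append(DIRECTIONS[step])
    return codes
```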

33. The method of claim 32 wherein evolutionary align- 
ment for each pair of chain-coded information sequences 
under matching comprises: 

exploiting concave-gap penalty for calculating the cost of 
opening up a gap for insertion and deletion of a number 
of codes; and

exploiting complex scoring matrix for calculating the cost 
of replacing one code by another code. 



34. The method of claim 33 wherein the concave gap 
penalty uses one constant value or the value of a mathemati- 
cal function for the cost of opening up a gap and one 
constant value or the value of a mathematical function for 
the cost of inserting or deleting a number of consecutive
codes.

35. The method of claim 33 wherein the complex scoring 
matrix uses a constant value or the value of a mathematical 
function for the cost of replacing one code by another code. 
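
Claims 33 to 35 leave the actual penalty and scoring functions open. As a sketch under stated assumptions, the alignment can be modelled as a global dynamic-programming alignment with a general gap function: here a gap of k codes costs `gap_open + log2(k)`, which is concave in k, and replacing one direction code by another scores according to their circular distance. The constants and the names `sub_score`, `gap_cost` and `align_score` are all hypothetical choices for illustration.

```python
import math

def sub_score(a, b):
    """Assumed scoring of replacing code a by code b (8-direction codes):
    2 for identical directions, falling to -2 for opposite directions."""
    d = abs(a - b) % 8
    return 2 - min(d, 8 - d)

def gap_cost(k, gap_open=2.0):
    """Assumed concave penalty for a gap of k consecutive codes (k >= 1)."""
    return gap_open + math.log2(k)

def align_score(s, t):
    """Similarity score of two chain-code sequences under the costs above."""
    n, m = len(s), len(t)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -gap_cost(i)
    for j in range(1, m + 1):
        S[0][j] = -gap_cost(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = S[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1])
            for k in range(1, i + 1):   # gap of k codes in t
                best = max(best, S[i - k][j] - gap_cost(k))
            for k in range(1, j + 1):   # gap of k codes in s
                best = max(best, S[i][j - k] - gap_cost(k))
            S[i][j] = best
    return S[n][m]
```

With a concave penalty, one long gap is cheaper than several short ones of the same total length, which favours alignments where a stroke segment is missing as a whole. The triple loop is cubic in the sequence length, so it is suitable only for short chain codes.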

36. The method of claim 32 wherein the handwritten 
curves of alphanumeric characters and symbols are
pre-stored in the database for the user-dependent handwriting
application.

37. The method of claim 32 further comprising resizing the
received sample or database samples within the chosen
fixed-size bounding box for curve-size normalization to
increase recognition or retrieval accuracy. 

38. The method of claim 32 wherein the information items 
are applicable to musical notes and audio tones extracted 
from the musical signal, from which the variations of this 
information over time are treated as curves and then con- 
verted to chain-coded information sequences for matching 
and retrieval. 

* * * * *